AITopics | financial domain

Collaborating Authors

financial domain

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

FinBen: A Holistic Financial Benchmark for Large Language Models

Neural Information Processing SystemsFeb-17-2026, 10:09:05 GMT

Their novel solutions outperformed GPT -4, showcasing FinBen's potential to drive innovations in financial LLMs.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Asia > Taiwan (0.04)
Asia > Middle East > Jordan (0.04)
Oceania > Australia (0.04)
(14 more...)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.45)

Industry:

Information Technology (1.00)
Government (1.00)
Banking & Finance > Trading (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Dissecting the Ledger: Locating and Suppressing "Liar Circuits" in Financial Large Language Models

Mirajkar, Soham

arXiv.org Artificial IntelligenceDec-1-2025

Large Language Models (LLMs) are increasingly deployed in high-stakes financial domains, yet they suffer from specific, reproducible hallucinations when performing arithmetic operations. Current mitigation strategies often treat the model as a black box. In this work, we propose a mechanistic approach to intrinsic hallucination detection. By applying Causal Tracing to the GPT-2 XL architecture on the ConvFinQA benchmark, we identify a dual-stage mechanism for arithmetic reasoning: a distributed computational scratchpad in middle layers (L12-L30) and a decisive aggregation circuit in late layers (specifically Layer 46). We verify this mechanism via an ablation study, demonstrating that suppressing Layer 46 reduces the model's confidence in hallucinatory outputs by 81.8%. Furthermore, we demonstrate that a linear probe trained on this layer generalizes to unseen financial topics with 98% accuracy, suggesting a universal geometry of arithmetic deception.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2511.21756

Genre: Research Report (0.66)

Industry: Banking & Finance (0.31)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

A Methodology for Assessing the Risk of Metric Failure in LLMs Within the Financial Domain

Flanagan, William, Das, Mukunda, Ramanayake, Rajitha, Maslekar, Swanuja, Mangipudi, Meghana, Choi, Joong Ho, Nair, Shruti, Bhusan, Shambhavi, Dulam, Sanjana, Pendharkar, Mouni, Singh, Nidhi, Doshi, Vashisth, Paresh, Sachi Shah

arXiv.org Artificial IntelligenceOct-17-2025

As Generative Artificial Intelligence is adopted across the financial services industry, a significant barrier to adoption and usage is measuring model performance. Historical machine learning metrics can oftentimes fail to generalize to GenAI workloads and are often supplemented using Subject Matter Expert (SME) Evaluation. Even in this combination, many projects fail to account for various unique risks present in choosing specific metrics. Additionally, many widespread benchmarks created by foundational research labs and educational institutions fail to generalize to industrial use. This paper explains these challenges and provides a Risk Assessment Framework to allow for better application of SME and machine learning Metrics

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.13524

Country: North America > United States (1.00)

Genre: Research Report (0.84)

Industry:

Law (1.00)
Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Banking & Finance > Insurance (0.71)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.50)

Add feedback

FinBen: A Holistic Financial Benchmark for Large Language Models

Neural Information Processing SystemsOct-10-2025, 13:05:29 GMT

Their novel solutions outperformed GPT -4, showcasing FinBen's potential to drive innovations in financial LLMs.

arxiv preprint arxiv, dataset, llm, (14 more...)

Neural Information Processing Systems

Country:

Asia > Taiwan (0.04)
Asia > Middle East > Jordan (0.04)
Oceania > Australia (0.04)
(14 more...)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.45)

Industry:

Information Technology (1.00)
Government (1.00)
Banking & Finance > Trading (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

CBP-Tuning: Efficient Local Customization for Black-box Large Language Models

Zhao, Jiaxuan, Gu, Naibin, Feng, Yuchen, Liu, Xiyu, Fu, Peng, Lin, Zheng, Wang, Weiping

arXiv.org Artificial IntelligenceSep-16-2025

The high costs of customizing large language models (LLMs) fundamentally limit their adaptability to user-specific needs. Consequently, LLMs are increasingly offered as cloud-based services, a paradigm that introduces critical limitations: providers struggle to support personalized customization at scale, while users face privacy risks when exposing sensitive data. To address this dual challenge, we propose Customized Black-box Prompt Tuning (CBP-Tuning), a novel framework that facilitates efficient local customization while preserving bidirectional privacy. Specifically, we design a two-stage framework: (1) a prompt generator trained on the server-side to capture domain-specific and task-agnostic capabilities, and (2) user-side gradient-free optimization that tailors soft prompts for individual tasks. This approach eliminates the need for users to access model weights or upload private data, requiring only a single customized vector per task while achieving effective adaptation. Furthermore, the evaluation of CBP-Tuning in the commonsense reasoning, medical and financial domain settings demonstrates superior performance compared to baselines, showcasing its advantages in task-agnostic processing and privacy preservation.

artificial intelligence, large language model, natural language, (14 more...)

arXiv.org Artificial Intelligence

2509.12112

Country:

Asia (0.46)
Europe > Austria (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (0.66)
Transportation > Air (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Uncovering the Vulnerability of Large Language Models in the Financial Domain via Risk Concealment

Cheng, Gang, Jin, Haibo, Zhang, Wenbin, Wang, Haohan, Zhuang, Jun

arXiv.org Artificial IntelligenceSep-16-2025

Large Language Models (LLMs) are increasingly integrated into financial applications, yet existing red-teaming research primarily targets harmful content, largely neglecting regulatory risks. In this work, we aim to investigate the vulnerability of financial LLMs through red-teaming approaches. We introduce Risk-Concealment Attacks (RCA), a novel multi-turn framework that iteratively conceals regulatory risks to provoke seemingly compliant yet regulatory-violating responses from LLMs. To enable systematic evaluation, we construct FIN-Bench, a domain-specific benchmark for assessing LLM safety in financial contexts. Extensive experiments on FIN-Bench demonstrate that RCA effectively bypasses nine mainstream LLMs, achieving an average attack success rate (ASR) of 93.18%, including 98.28% on GPT-4.1 and 97.56% on OpenAI o1. These findings reveal a critical gap in current alignment techniques and underscore the urgent need for stronger moderation mechanisms in financial domains. We hope this work offers practical insights for advancing robust and domain-aware LLM alignment.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2509.10546

Country: North America > United States > Illinois (0.28)

Genre: Research Report > New Finding (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Government (1.00)
Banking & Finance > Trading (1.00)
Law (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.36)

Add feedback

FinGAIA: A Chinese Benchmark for AI Agents in Real-World Financial Domain

Zeng, Lingfeng, Lou, Fangqi, Wang, Zixuan, Xu, Jiajie, Niu, Jinyi, Li, Mengping, Dong, Yifan, Qi, Qi, Zhang, Wei, Yang, Ziwei, Han, Jun, Feng, Ruilun, Hu, Ruiqi, Zhang, Lejie, Feng, Zhengbo, Ren, Yicheng, Guo, Xin, Liu, Zhaowei, Cheng, Dongpo, Cai, Weige, Zhang, Liwen

arXiv.org Artificial IntelligenceAug-1-2025

The booming development of AI agents presents unprecedented opportunities for automating complex tasks across various domains. However, their multi-step, multi-tool collaboration capabilities in the financial sector remain underexplored. This paper introduces FinGAIA, an end-to-end benchmark designed to evaluate the practical abilities of AI agents in the financial domain. FinGAIA comprises 407 meticulously crafted tasks, spanning seven major financial sub-domains: securities, funds, banking, insurance, futures, trusts, and asset management. These tasks are organized into three hierarchical levels of scenario depth: basic business analysis, asset decision support, and strategic risk management. We evaluated 10 mainstream AI agents in a zero-shot setting. The best-performing agent, ChatGPT, achieved an overall accuracy of 48.9\%, which, while superior to non-professionals, still lags financial experts by over 35 percentage points. Error analysis has revealed five recurring failure patterns: Cross-modal Alignment Deficiency, Financial Terminological Bias, Operational Process Awareness Barrier, among others. These patterns point to crucial directions for future research. Our work provides the first agent benchmark closely related to the financial domain, aiming to objectively assess and promote the development of agents in this crucial field. Partial data is available at https://github.com/SUFE-AIFLM-Lab/FinGAIA.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2507.17186

Country: Asia > China (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

FEVO: Financial Knowledge Expansion and Reasoning Evolution for Large Language Models

Pang, Bo, Ouyang, Yalu, Xu, Hangfei, Jia, Ziqi, Li, Panpan, Wen, Shengzhao, Wang, Lu, Li, Shiyong, Wang, Yanpeng

arXiv.org Artificial IntelligenceJul-10-2025

Advancements in reasoning for large language models (LLMs) have lead to significant performance improvements for LLMs in various fields such as mathematics and programming. However, research applying these advances to the financial domain, where considerable domain-specific knowledge is necessary to complete tasks, remains limited. To address this gap, we introduce FEVO (Financial Evolution), a multi-stage enhancement framework developed to enhance LLM performance in the financial domain. FEVO systemically enhances LLM performance by using continued pre-training (CPT) to expand financial domain knowledge, supervised fine-tuning (SFT) to instill structured, elaborate reasoning patterns, and reinforcement learning (RL) to further integrate the expanded financial domain knowledge with the learned structured reasoning . To ensure effective and efficient training, we leverage frontier reasoning models and rule-based filtering to curate FEVO-Train, high-quality datasets specifically designed for the different post-training phases. Using our framework, we train the FEVO series of models--C32B, S32B, R32B--from Qwen2.5-32B and evaluate them on seven benchmarks to assess financial and general capabilities, with results showing that FEVO-R32B achieves state-of-the-art performance on five financial benchmarks against much larger models as well as specialist models. More significantly, FEVO-R32B demonstrates markedly better performance than FEVO-R32B-0 (trained from Qwen2.5-32B-Instruct using only RL), thus validating the effectiveness of financial domain knowledge expansion and structured, logical reasoning distillation . Recent studies in large language models (LLMs) have lead to widespread popularity of enhancing model capabilities via long Chain-Of-Thought (CoT) reasoning.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2507.06057

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

CFBenchmark-MM: Chinese Financial Assistant Benchmark for Multimodal Large Language Model

Li, Jiangtong, Zhu, Yiyun, Cheng, Dawei, Ding, Zhijun, Jiang, Changjun

arXiv.org Artificial IntelligenceJun-17-2025

Multimodal Large Language Models (MLLMs) have rapidly evolved with the growth of Large Language Models (LLMs) and are now applied in various fields. In finance, the integration of diverse modalities such as text, charts, and tables is crucial for accurate and efficient decision-making. Therefore, an effective evaluation system that incorporates these data types is essential for advancing financial application. In this paper, we introduce CFBenchmark-MM, a Chinese multimodal financial benchmark with over 9,000 image-question pairs featuring tables, histogram charts, line charts, pie charts, and structural diagrams. Additionally, we develop a staged evaluation system to assess MLLMs in handling multimodal information by providing different visual content step by step. Despite MLLMs having inherent financial knowledge, experimental results still show limited efficiency and robustness in handling multimodal financial context. Further analysis on incorrect responses reveals the misinterpretation of visual content and the misunderstanding of financial concepts are the primary issues. Our research validates the significant, yet underexploited, potential of MLLMs in financial analysis, highlighting the need for further development and domain-specific optimization to encourage the enhanced use in financial domain.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2506.13055

Country: Asia > China (0.68)

Genre: Research Report > New Finding (0.93)

Industry:

Energy (1.00)
Banking & Finance > Trading (1.00)
Transportation > Ground > Road (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation

Luo, Junyu, Kou, Zhizhuo, Yang, Liming, Luo, Xiao, Huang, Jinsheng, Xiao, Zhiping, Peng, Jingshu, Liu, Chengzhong, Ji, Jiaming, Liu, Xuanzhe, Han, Sirui, Zhang, Ming, Guo, Yike

arXiv.org Artificial IntelligenceJun-2-2025

Multimodal Large Language Models (MLLMs) have experienced rapid development in recent years. However, in the financial domain, there is a notable lack of effective and specialized multimodal evaluation datasets. To advance the development of MLLMs in the finance domain, we introduce FinMME, encompassing more than 11,000 high-quality financial research samples across 18 financial domains and 6 asset classes, featuring 10 major chart types and 21 subtypes. We ensure data quality through 20 annotators and carefully designed validation mechanisms. Additionally, we develop FinScore, an evaluation system incorporating hallucination penalties and multi-dimensional capability assessment to provide an unbiased evaluation. Extensive experimental results demonstrate that even state-of-the-art models like GPT-4o exhibit unsatisfactory performance on FinMME, highlighting its challenging nature. The benchmark exhibits high robustness with prediction variations under different prompts remaining below 1%, demonstrating superior reliability compared to existing datasets. Our dataset and evaluation protocol are available at https://huggingface.co/datasets/luojunyu/FinMME and https://github.com/luo-junyu/FinMME.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2505.24714

Country:

Asia (0.28)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback